Online Residential Demand Response via Contextual Multi-Armed Bandits
نویسندگان
چکیده
Residential loads have great potential to enhance the efficiency and reliability of electricity systems via demand response (DR) programs. One major challenge in residential DR is how learn handle unknown uncertain customer behaviors. In this letter, we consider problem where load service entity (LSE) aims select an optimal subset customers optimize some performance, such as maximizing expected reduction with a financial budget or minimizing squared deviation from target level. To behaviors influenced by various time-varying environmental factors, formulate contextual multi-armed bandit (MAB) problem, develop online learning selection (OLS) algorithm based on Thompson sampling solve it. This takes information into consideration applicable complicated settings. Numerical simulations are performed demonstrate effectiveness proposed algorithm.
منابع مشابه
Contextual Multi-Armed Bandits
We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions ...
متن کاملContextual Multi-armed Bandits under Feature Uncertainty
We study contextual multi-armed bandit problems under linear realizability on rewards and uncertainty (or noise) on features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, having O ( T 7 8 ( log (dT ) +K √ d )) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that...
متن کاملA Survey on Contextual Multi-armed Bandits
4 Stochastic Contextual Bandits 6 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption . . . . 6 4.1.1 LinUCB/SupLinUCB . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.1.2 LinREL/SupLinREL . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1.3 CofineUCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.1.4 Thompson Sampling with Linear Payoffs...
متن کاملThe Epoch-Greedy Algorithm for Contextual Multi-armed Bandits
We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales asO(T S) or better (sometimes, much better). Here S is th...
متن کاملExponentiated Gradient LINUCB for Contextual Multi-Armed Bandits
We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Control Systems Letters
سال: 2021
ISSN: ['2475-1456']
DOI: https://doi.org/10.1109/lcsys.2020.3003190